nlp_architect.pipelines package

Submodules

nlp_architect.pipelines.spacy_bist module

class nlp_architect.pipelines.spacy_bist.SpacyBISTParser(verbose=False, spacy_model='en', bist_model=None)[source]

Bases: object

Main class which handles parsing with Spacy-BIST parser.

Parameters:
  • verbose (bool, optional) – Controls output verbosity.
  • spacy_model (str, optional) – Spacy model to use (see https://spacy.io/api/top-level#spacy.load).
  • bist_model (str, optional) – Path to a .model file to load. Defaults to the pre-trained model.
dir = PosixPath('/home/peter_nlp/nlp-architect/cache/bist-pretrained')
parse(doc_text, show_tok=True, show_doc=True)[source]

Parse a raw text document.

Parameters:
  • doc_text (str) – Raw text of the document to parse.
  • show_tok (bool, optional) – Specifies whether to include token text in output.
  • show_doc (bool, optional) – Specifies whether to include document text in output.
Returns:

The annotated document.

Return type:

CoreNLPDoc
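
A minimal usage sketch of the constructor and parse() as documented above. The helper name parse_text is hypothetical, and the import is deferred so that any object exposing the same parse() signature can be passed in instead. Building the default parser assumes that the 'en' Spacy model and the pre-trained BIST model are available.

```python
def parse_text(text, parser=None):
    """Parse raw text with a SpacyBISTParser (hypothetical helper)."""
    if parser is None:
        # Deferred import: only needed when no parser is supplied.
        from nlp_architect.pipelines.spacy_bist import SpacyBISTParser
        # Same defaults as the documented constructor signature.
        parser = SpacyBISTParser(verbose=False, spacy_model='en', bist_model=None)
    # Keep both token text and document text in the returned CoreNLPDoc.
    return parser.parse(text, show_tok=True, show_doc=True)
```

Constructing the parser loads the models, so reusing one parser instance across many documents is likely preferable to creating one per call.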

to_conll(doc_text)[source]

Converts a document to CoNLL format with spacy POS tags.

Parameters: doc_text (str) – Raw document text.
Yields: list of ConllEntry – The next sentence in the document in CoNLL format.
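
Because to_conll() yields one sentence at a time, its results have to be iterated over or collected. The sketch below shows one way to gather them; the helper name conll_sentences is hypothetical, and parser is assumed to be a SpacyBISTParser (or any object with a compatible to_conll generator).

```python
def conll_sentences(text, parser):
    """Collect every sentence yielded by parser.to_conll (hypothetical helper)."""
    # to_conll is a generator: each item is one sentence,
    # given as a list of ConllEntry objects.
    return [list(sentence) for sentence in parser.to_conll(text)]
```

For long documents, iterating over parser.to_conll(text) directly avoids holding every sentence in memory at once.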

nlp_architect.pipelines.spacy_np_annotator module

class nlp_architect.pipelines.spacy_np_annotator.NPAnnotator(model, word_vocab, char_vocab, chunk_vocab, batch_size: int = 32)[source]

Bases: object

Spacy-based NP annotator that uses a models.SequenceChunker model for annotation.

Parameters:
  • model (SequenceChunker) – a chunker model
  • word_vocab (Vocabulary) – word-id vocabulary of the model
  • char_vocab (Vocabulary) – character-id vocabulary of the model
  • chunk_vocab (Vocabulary) – chunk tag vocabulary of the model
  • batch_size (int, optional) – inference batch size
classmethod load(model_path: str, parameter_path: str, batch_size: int = 32, use_cudnn: bool = False)[source]

Load an NPAnnotator.

Parameters:
  • model_path (str) – path to trained model
  • parameter_path (str) – path to model parameters
  • batch_size (int, optional) – inference batch_size
  • use_cudnn (bool, optional) – use GPU for inference (cuDNN cells)
Returns:

NPAnnotator instance with the loaded model
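
A short sketch of calling the load() classmethod with the documented arguments. The helper name load_annotator and the annotator_cls parameter are hypothetical; they only make it possible to substitute another class that offers the same load() signature. The model and parameter paths are assumed to point at files produced by a previously trained chunker.

```python
def load_annotator(model_path, parameter_path, annotator_cls=None):
    """Load an NP annotator from disk (hypothetical helper)."""
    if annotator_cls is None:
        # Deferred import: only needed when no class is supplied.
        from nlp_architect.pipelines.spacy_np_annotator import NPAnnotator
        annotator_cls = NPAnnotator
    # Documented defaults: batch size of 32, cuDNN cells disabled.
    return annotator_cls.load(model_path, parameter_path,
                              batch_size=32, use_cudnn=False)
```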

class nlp_architect.pipelines.spacy_np_annotator.SpacyNPAnnotator(model_path, settings_path, spacy_model='en', batch_size=32, use_cudnn=False)[source]

Bases: object

Simple Spacy pipe with NP extraction annotations

nlp_architect.pipelines.spacy_np_annotator.get_noun_phrases(doc: spacy.tokens.doc.Doc) → [spacy.tokens.span.Span][source]

Get noun phrase tags from a spacy annotated document.

Parameters: doc (Doc) – a spacy type document
Returns: a list of noun phrase Span objects
nlp_architect.pipelines.spacy_np_annotator.set_noun_phrases(doc: spacy.tokens.doc.Doc, nps: [spacy.tokens.span.Span]) → None[source]

Set noun phrase tags

Parameters:
  • doc (Doc) – a spacy type document
  • nps ([Span]) – a list of Spans
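
As an illustration of reading the annotations back, the sketch below turns the Span objects returned by get_noun_phrases() into plain strings. The helper name noun_phrase_texts and the get_phrases parameter are hypothetical; doc is assumed to be a Spacy document that has already been annotated, for example by a pipeline containing SpacyNPAnnotator.

```python
def noun_phrase_texts(doc, get_phrases=None):
    """Return the text of each noun phrase stored on doc (hypothetical helper)."""
    if get_phrases is None:
        # Deferred import: only needed when no getter is supplied.
        from nlp_architect.pipelines.spacy_np_annotator import get_noun_phrases
        get_phrases = get_noun_phrases
    # Each noun phrase is a spacy Span; .text gives its surface string.
    return [span.text for span in get_phrases(doc)]
```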

Module contents